DPL: add SendingPolicy for the case destination is expendable #12676

Merged
merged 2 commits into from
Feb 10, 2024

Conversation

ktf
Member

@ktf ktf commented Feb 8, 2024

DPL: add SendingPolicy for the case destination is expendable


Stack created with Sapling. Best reviewed with ReviewStack.

@ktf
Member Author

ktf commented Feb 8, 2024

@Barthelemy @knopers8 @davidrohr @martenole this makes my standalone test behave correctly when an expendable task is killed. Indeed, the oldest possible timeframe needs to use the policy mechanism, and we need an explicit policy for dropping data not going to expendables. In principle, this also allows things which are not "Dispatcher"s to come before an expendable task.
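
For readers unfamiliar with the policy mechanism mentioned above, here is a minimal, self-contained sketch of the general idea: a list of policies is scanned in order, the first one whose matcher accepts the (source, destination) edge is used, and the policy for expendable destinations drops data it cannot deliver instead of treating that as a failure. Every type and function here (`Device`, `Channel`, `Policy`, `makePolicies`, `selectPolicy`, the `"expendable"` label check) is a hypothetical stand-in invented for illustration; this is not the DPL `SendingPolicy` interface, which this thread does not show, and the throwing default policy is only an assumed placeholder for whatever the framework does when a non-expendable peer is unavailable.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the real DPL types.
struct Device {
  std::string name;
  std::vector<std::string> labels;
};

struct Message {
  int timeframe;
};

struct Channel {
  bool peerAlive = true;
  std::vector<Message> delivered;

  // Returns false when nobody is listening on the other end.
  bool trySend(Message const& m)
  {
    if (!peerAlive) {
      return false;
    }
    delivered.push_back(m);
    return true;
  }
};

struct Policy {
  std::string name;
  std::function<bool(Device const& source, Device const& dest)> matcher;
  std::function<bool(Channel&, Message const&)> send; // true if delivered
};

bool hasLabel(Device const& d, std::string const& label)
{
  return std::find(d.labels.begin(), d.labels.end(), label) != d.labels.end();
}

// Policies are ordered: the specific one first, the catch-all last.
std::vector<Policy> makePolicies()
{
  std::vector<Policy> policies;
  policies.push_back(Policy{
    "expendable",
    [](Device const&, Device const& dest) { return hasLabel(dest, "expendable"); },
    // Destination may be gone: drop the message and carry on.
    [](Channel& c, Message const& m) { return c.trySend(m); }});
  policies.push_back(Policy{
    "default",
    [](Device const&, Device const&) { return true; },
    // Assumption: a missing non-expendable peer is an error.
    [](Channel& c, Message const& m) {
      if (!c.trySend(m)) {
        throw std::runtime_error("peer unavailable");
      }
      return true;
    }});
  return policies;
}

// First matching policy wins.
Policy const& selectPolicy(std::vector<Policy> const& policies,
                           Device const& source, Device const& dest)
{
  assert(!policies.empty());
  for (auto const& p : policies) {
    if (p.matcher(source, dest)) {
      return p;
    }
  }
  throw std::logic_error("no policy matched");
}
```

In this toy model the choice depends only on the destination, so any upstream device, not just a dispatcher, would pick up the drop-on-failure behaviour when it sends to a device labelled expendable.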

@ktf ktf force-pushed the pr12676 branch 3 times, most recently from 0e96e7b to c614f05 Compare February 8, 2024 12:57
@Barthelemy
Collaborator

Thanks!

@ktf
Member Author

ktf commented Feb 8, 2024

A couple of tests seem to be broken, actually. I am looking into it.

@ktf
Member Author

ktf commented Feb 8, 2024

Ok, I understand what is going on. Fix coming later.

@alibuild
Collaborator

alibuild commented Feb 8, 2024

Error while checking build/O2/fullCI for c614f05 at 2024-02-09 13:29:

## sw/BUILD/O2-latest/log
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
c++: error: unrecognized command-line option '--rtlib=compiler-rt'


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspectorService.cxx:208:11: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspectorService.cxx:241:36: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspector.cxx:47:26: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
++ [[ 0 == 0 ]]
++ exit 1
--

Full log here.

@martenole
Contributor

Hi @ktf, I can reproduce the crash of the fullCI locally. Without this PR it's working fine. Could you please take a look?

@alibuild
Collaborator

alibuild commented Feb 9, 2024

Error while checking build/O2/fullCI for 29ce029 at 2024-02-10 00:28:

## sw/BUILD/O2-latest/log
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
c++: error: unrecognized command-line option '--rtlib=compiler-rt'


## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc8_x86-64/O2/12676-slc8_x86-64-local1/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 128
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/03/03568/189ef4a1-3a84-11ee-a1e3-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/13/23867/6be6113d-8565-11ee-82d7-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/10/53949/ee079921-3fcd-11ed-8727-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/05/50377/c9ab44c0-9fe3-11ec-975c-08f1eaf024ee; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/15/00514/fd4f6e74-d4ae-11ec-90c5-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/11/01024/22a69fd7-fafa-11ed-9692-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/14/31594/fc03e20a-4690-11ed-a67e-08f1eaf024ee; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/08/54749/e815dd7d-06d8-11ee-b4d8-08f1eaf024ee; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/14/23401/8e0f2f5c-e6d1-11ed-b8b9-08f1eaf024ee; Permission denied
[ERROR] Workflow crashed - PID 56212 (TRD-RawData-proxy) did not exit correctly however it's not clear why. Exit code forced to 128.
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/03/44207/decac431-06d8-11ee-b4d8-08f1eaf024ee; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/00/32367/4efc04f6-2ec7-11ed-8727-08f1eaf0250c; Permission denied
[56061:internal-dpl-ccdb-backend]: Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Unable to open /eos/alice/cond/15/17990/9c6ec40b-2c77-11ed-ac0b-08f1eaf024ee; Permission denied
[ERROR]  - Device TRD-RawData-proxy: pid 56212 (exit 128)
[ERROR] SEVERE: Device TRD-RawData-proxy (56212) returned with 128


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/03b3165a95f8a5fcaee2c5a1e2e1ab1ba247d260/slc8_x86-64/o2checkcode/1.0-local1088/etc/modulefiles
++ cat
--

Full log here.

@alibuild
Collaborator

alibuild commented Feb 10, 2024

Error while checking build/O2/fullCI for 37b4c43 at 2024-02-10 07:24:

## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspector.cxx:47:26: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspectorService.cxx:208:11: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
/sw/SOURCES/O2/12676-slc8_x86-64/0/Framework/DataInspector/src/DataInspectorService.cxx:241:36: error: namespace 'o2::framework::DataInspector' does not follow the underscore convention [aliceO2-namespace-naming]
++ [[ 0 == 0 ]]
++ exit 1
--

Full log here.

Contributor

@martenole martenole left a comment


Thanks, I don't see any errors anymore. But with this included we get backpressure on the EPNs after running for some time. Is a negative performance impact expected from this change?

@martenole
Contributor

Never mind. This was due to additional processing steps.

@martenole martenole merged commit 976e460 into AliceO2Group:dev Feb 10, 2024
18 checks passed
@ktf ktf deleted the pr12676 branch February 11, 2024 09:09
4 participants